Hunting Down Frame Shifts: Ecological Analysis of Diverse Functional Gene Sequences
Authors
Abstract
Ecological analyses of functional genes based on amplicon sequencing can be challenging because translated sequences often carry shifted reading frames. The aim of this work was to evaluate several bioinformatics tools designed to correct sequencing errors, in an effort to reduce the number of frameshifts (FS). Genes encoding the alpha subunits of biphenyl (bphA) and benzoate (benA) dioxygenases were used as model sequences. FrameBot, an FS correction tool, was able to reduce the number of detected FS to zero. However, up to 44% of sequences were discarded by FrameBot as non-specific targets. Therefore, we proposed a de novo mode of FrameBot for FS correction, which works on a similar basis as common chimera-identifying platforms and does not depend on reference sequences. By the nature of the FrameBot de novo design, it is crucial to provide it with data that are as error-free as possible. We tested the ability of several publicly available correction tools to decrease the number of errors in the data sets. The combination of maximum expected error filtering and single-linkage pre-clustering proved to be the most efficient read-processing approach. Applying FrameBot de novo to the processed data enabled analysis of BphA sequences with minimal losses of potentially functional sequences not homologous to those previously known. This experiment also demonstrated the extensive diversity of dioxygenases in soil. A script that performs FrameBot de novo is presented in the supplementary material to the study and is available at https://github.com/strejcem/FBdenovo. The tool was also implemented in the FunGene Pipeline, available at http://fungene.cme.msu.edu/FunGenePipeline/.
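As a rough illustration of the maximum expected error filtering mentioned in the abstract, the sketch below computes the expected number of errors in a read as the sum of the per-base error probabilities implied by its Phred quality scores, and keeps the read only if that sum does not exceed a threshold. This is not the code used in the study: the function names, the Phred+33 encoding, and the default threshold of 1.0 are assumptions chosen for the example.

```python
def expected_errors(quality: str, phred_offset: int = 33) -> float:
    """Return the expected number of errors in a read.

    Each quality character encodes a Phred score Q (assumed Phred+33),
    and the error probability of that base is 10 ** (-Q / 10).
    """
    return sum(10 ** (-(ord(ch) - phred_offset) / 10) for ch in quality)


def passes_max_ee(quality: str, max_ee: float = 1.0) -> bool:
    """Keep a read only if its expected error count is at most max_ee.

    The default threshold is illustrative, not a value from the study.
    """
    return expected_errors(quality) <= max_ee


if __name__ == "__main__":
    high_quality = "I" * 10   # ten bases at Q40
    low_quality = "+" * 12    # twelve bases at Q10
    print(passes_max_ee(high_quality), passes_max_ee(low_quality))
```

Filtering on the summed error probability rather than on the mean quality score penalises long reads with many moderately uncertain bases, which is the kind of read most likely to carry an insertion or deletion that shifts the reading frame.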
Similar resources
Exact mapping of prokaryotic gene starts
It is known that while the programs used to find genes in prokaryotic genomes reliably map protein-coding regions, they often fail in the exact determination of gene starts. This problem is further aggravated by sequencing errors, most notably insertions and deletions leading to frame-shifts. Therefore, the exact mapping of gene starts and identification of frame-shifts are important problems o...
Research, part of a Special Feature on Heterogeneity and Resilience of Human-Rangifer Systems: A CircumArctic Synthesis Modeling Regional Dynamics of Human–Rangifer Systems: a Framework for Comparative Analysis
Theoretical models of interaction between wild and domestic reindeer (Rangifer tarandus; caribou in North America) can help explain observed social–ecological dynamics of arctic hunting and husbandry systems. Different modes of hunting and husbandry incorporate strategies to mitigate effects of differing patterns of environmental uncertainty. Simulations of simple models of harvested wild and d...
Down-Regulation of the ALS3 Gene as a Consequent Effect of RNA-Mediated Silencing of the EFG1 Gene in Candida albicans
Background: The most important virulence factor which plays a central role in Candida albicans pathogenesis is the ability of this yeast to alternate between unicellular yeast and filamentous hyphal forms. Efg1 protein is thought to be the main positive regulating transcription factor, which is responsible for regulating hyphal-specific gene expression under most conditions. ALS3 is one of the ...
Phylogenetic and sequence analysis of the growth hormone gene of two sturgeons, Huso huso and Acipenser Gueldenstaedtii
In this study, the cDNA Growth Hormone (cGH) of the Beluga sturgeon (Huso huso) and Russian sturgeon (Acipenser gueldenstaedtii) were cloned and sequenced, and phylogenetic relationships were examined using nucleic acid and amino acid sequences. The nucleotide sequence of the Beluga GH has an open reading frame of 645 nucleotides encoding a protein of 214 amino acid residues. The signal peptide cleav...
The Investigation of Mutations and Comparison of Leptin Gene Pro-Motor in Najdi Cattle with the Database NCBI Sequences
Objective: Identifying the genetic aspects and major gene influences on energy balance, milk production, fertility, food safety and consumer are recent interests of genetics and breeding researchers. Methods: Najdi cattle are among the most prominent breeds in Khuzestan province. For this study, blood samples were taken from 15 Najdi cattle at the Shoushtar Najdi Cattle Station. DNA was extracted from wh...